Partitioning: Use more consistent language regarding partitions and buckets #853
RasmusRendal wants to merge 1 commit into nats-io:master
In the current state, a partition consists of partitions, which is quite confusing. This PR changes that, such that a partition consists of buckets.
  Deterministic token partitioning allows you to use subject-based addressing to deterministically divide (partition) a flow of messages where one or more of the subject tokens is mapped into a partition key. Deterministically means, the same tokens are always mapped into the same key. The mapping will appear random and may not be `fair` for a small number of subjects.

- For example: new customer orders are published on `neworders.<customer id>`, you can partition those messages over 3 partition numbers (buckets), using the `partition(number of partitions, wildcard token positions...)` function which returns a partition number (between 0 and number of partitions-1) by using the following mapping `"neworders.*" : "neworders.{{wildcard(1)}}.{{partition(3,1)}}"`.
+ For example: new customer orders are published on `neworders.<customer id>`, you can partition those messages over 3 partition numbers (buckets), using the `partition(number of buckets, wildcard token positions...)` function which returns a partition number (between 0 and number of partitions-1) by using the following mapping `"neworders.*" : "neworders.{{wildcard(1)}}.{{partition(3,1)}}"`.
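To make the terminology in this hunk concrete, here is a minimal Python sketch of what the `neworders.*` mapping does conceptually. It is an illustration, not the server's implementation: the documentation text above does not say which hash is used, so the 32-bit FNV-1a hash is an assumption, and the names `fnv1a_32`, `partition`, and `map_subject` are hypothetical helpers invented for this example.

```python
# Assumption: 32-bit FNV-1a stands in for whatever hash the server really uses.
FNV_OFFSET_BASIS_32 = 0x811C9DC5
FNV_PRIME_32 = 0x01000193


def fnv1a_32(data: bytes) -> int:
    """Return the 32-bit FNV-1a hash of data."""
    h = FNV_OFFSET_BASIS_32
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME_32) & 0xFFFFFFFF  # keep the value within 32 bits
    return h


def partition(num_partitions: int, *tokens: str) -> int:
    """Map the given tokens to a number in 0..num_partitions-1.

    The same tokens always produce the same number (deterministic).
    """
    key = "".join(tokens).encode("utf-8")
    return fnv1a_32(key) % num_partitions


def map_subject(subject: str, num_partitions: int = 3) -> str:
    """Hypothetical equivalent of the mapping
    "neworders.*" : "neworders.{{wildcard(1)}}.{{partition(3,1)}}".
    """
    tokens = subject.split(".")
    if len(tokens) != 2 or tokens[0] != "neworders":
        raise ValueError(f"subject does not match neworders.*: {subject!r}")
    customer_id = tokens[1]
    return f"neworders.{customer_id}.{partition(num_partitions, customer_id)}"


if __name__ == "__main__":
    for customer in ("customerid1", "customerid2", "customerid3"):
        print(map_subject(f"neworders.{customer}"))
```

Every subject lands in exactly one of the three numbered groups, and republishing on the same subject always yields the same number. Whether those groups are called partitions or buckets is the naming question this PR raises.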
Hi @RasmusRendal, thanks for the contribution. Now that you have pointed out this discrepancy, I think it would make more sense to remove the term bucket altogether and stick with partition, since bucket is redundant.
For example:

- For example: new customer orders are published on `neworders.<customer id>`, you can partition those messages over 3 partition numbers (buckets), using the `partition(number of buckets, wildcard token positions...)` function which returns a partition number (between 0 and number of partitions-1) by using the following mapping `"neworders.*" : "neworders.{{wildcard(1)}}.{{partition(3,1)}}"`.
+ For example: new customer orders are published on `neworders.<customer id>`, you can spread those messages over 3 partitions, using the `partition(number of partitions, wildcard token positions...)` function which returns a partition number (between 0 and number of partitions-1) by using the following mapping `"neworders.*" : "neworders.{{wildcard(1)}}.{{partition(3,1)}}"`.
Does this make sense?
I think it is still important to call the things a partition consists of something other than "partition", if not to make the documentation understandable, then to make it easier to talk about NATS. I want to be able to tell a colleague that "We create a partition of our subject, and each bucket/part is handled by a separate consumer".
This is also how people talk about partitions in other contexts: https://en.wikipedia.org/wiki/Partition_of_a_set#Definition_and_notation
The sets in $P$ are called the blocks, parts, or cells, of the partition.